4 research outputs found

    A Taxonomy for Mining and Classifying Privacy Requirements in Issue Reports

    Full text link
    Digital and physical footprints are a trail of user activities collected over the use of software applications and systems. As software becomes ubiquitous, protecting user privacy has become challenging. With the increasing of user privacy awareness and advent of privacy regulations and policies, there is an emerging need to implement software systems that enhance the protection of personal data processing. However, existing privacy regulations and policies only provide high-level principles which are difficult for software engineers to design and implement privacy-aware systems. In this paper, we develop a taxonomy that provides a comprehensive set of privacy requirements based on two well-established and widely-adopted privacy regulations and frameworks, the General Data Protection Regulation (GDPR) and the ISO/IEC 29100. These requirements are refined into a level that is implementable and easy to understand by software engineers, thus supporting them to attend to existing regulations and standards. We have also performed a study on how two large open-source software projects (Google Chrome and Moodle) address the privacy requirements in our taxonomy through mining their issue reports. The paper discusses how the collected issues were classified, and presents the findings and insights generated from our study.Comment: Submitted to IEEE Transactions on Software Engineering on 23 December 202

    On Privacy Weaknesses and Vulnerabilities in Software Systems

    No full text
    In this digital era, our privacy is under constant threat as our personal data and traceable online/offline activities are frequently collected, processed and transferred by many software applications. Privacy attacks are often formed by exploiting vulnerabilities found in those software applications. The Common Weakness Enumeration (CWE) and Common Vulnerabilities and Exposures (CVE) systems are currently the main sources that software engineers rely on for understanding and preventing publicly disclosed software vulnerabilities. However, our study on all 922 weaknesses in the CWE and 156,537 vulnerabilities registered in the CVE to date has found a very small coverage of privacy-related vulnerabilities in both systems, only 4.45% in CWE and 0.1% in CVE. These also cover only a small number of areas of privacy threats that have been raised in existing privacy software engineering research, privacy regulations and frameworks, and relevant reputable organisations. The actionable insights generated from our study led to the introduction of 11 new common privacy weaknesses to supplement the CWE system, making it become a source for both security and privacy vulnerabilities

    An empirical study of automated privacy requirements classification in issue reports

    No full text
    The recent advent of data protection laws and regulations has emerged to protect privacy and personal information of individuals. As the cases of privacy breaches and vulnerabilities are rapidly increasing, people are aware and more concerned about their privacy. These bring a significant attention to software development teams to address privacy concerns in developing software applications. As today’s software development adopts an agile, issue-driven approach, issues in an issue tracking system become a centralised pool that gathers new requirements, requests for modification and all the tasks of the software project. Hence, establishing an alignment between those issues and privacy requirements is an important step in developing privacy-aware software systems. This alignment also facilitates privacy compliance checking which may be required as an underlying part of regulations for organisations. However, manually establishing those alignments is labour intensive and time consuming. In this paper, we explore a wide range of machine learning and natural language processing techniques which can automatically classify privacy requirements in issue reports. We employ six popular techniques namely Bag-of-Words (BoW), N-gram Inverse Document Frequency (N-gram IDF), Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, Convolutional Neural Network (CNN) and Bidirectional Encoder Representations from Transformers (BERT) to perform the classification on privacy-related issue reports in Google Chrome and Moodle projects. The evaluation showed that BoW, N-gram IDF, TF-IDF and Word2Vec techniques are suitable for classifying privacy requirements in those issue reports. In addition, N-gram IDF is the best performer in both projects

    A taxonomy for mining and classifying privacy requirements in issue reports

    No full text
    Context: Digital and physical trails of user activities are collected over the use of software applications and systems. As software becomes ubiquitous, protecting user privacy has become challenging. With the increase of user privacy awareness and advent of privacy regulations and policies, there is an emerging need to implement software systems that enhance the protection of personal data processing. However, existing data protection and privacy regulations provide key principles in high-level, making it difficult for software engineers to design and implement privacy-aware systems. Objective: In this paper, we develop a taxonomy that provides a comprehensive set of privacy requirements based on four well-established personal data protection regulations and privacy frameworks, the General Data Protection Regulation (GDPR), ISO/IEC 29100, Thailand Personal Data Protection Act (Thailand PDPA) and Asia-Pacific Economic Cooperation (APEC) privacy framework. Methods: These requirements are extracted, refined and classified (using the goal-based requirements analysis method) into a level that can be used to map with issue reports. We have also performed a study on how two large open-source software projects (Google Chrome and Moodle) address the privacy requirements in our taxonomy through mining their issue reports. Results: The paper discusses how the collected issues were classified, and presents the findings and insights generated from our study. Conclusion: Mining and classifying privacy requirements in issue reports can help organisations be aware of their state of compliance by identifying privacy requirements that have not been addressed in their software projects. The taxonomy can also trace back to regulations, standards and frameworks that the software projects have not complied with based on the identified privacy requirements
    corecore